-downdb \
-webfrom annovar dbnsfp30a humandb/
The database files are downloaded into the specified directory “humandb”. Save the
annotation databases of each organism in a separate directory.
Not every non-human organism has a prebuilt annotation database. When none is available,
you can build one yourself. The following steps show how to build a gene-based annotation
database. As an example, we will build an annotation database for SARS-CoV-2 and use it
later to annotate the variants called in a previous example.
The steps to build a SARS-CoV-2 gene-based annotation database are as follows:
1. Download the reference genome sequence of the organism in FASTA format and the
sequence annotation file in GFF/GTF format. For SARS-CoV-2, we can download both
files from the NCBI Genome database at
https://www.ncbi.nlm.nih.gov/genome/86693?genome_assembly_id=757732
Use the following commands to create a directory “sarscov2db” and download the
reference FASTA file and GFF file into it:
mkdir sarscov2db
cd sarscov2db
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/858/895/GCF_009858895.2_ASM985889v3/GCF_009858895.2_ASM985889v3_genomic.fna.gz
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/858/895/GCF_009858895.2_ASM985889v3/GCF_009858895.2_ASM985889v3_genomic.gff.gz
Then, decompress the two files with the “gunzip” command:
gunzip GCF_009858895.2_ASM985889v3_genomic.fna.gz
gunzip GCF_009858895.2_ASM985889v3_genomic.gff.gz
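Before moving on, it can help to confirm that the sequence identifiers in the FASTA headers
match the first column of the GFF file, since a mismatch between the two is a common cause
of annotation failures later. The following is a minimal sketch of such a check, not part
of ANNOVAR or the UCSC tools; the function name “check_seqids” is hypothetical, and it
assumes a tab-delimited GFF3 file with nine columns per feature line.

```shell
# Hypothetical helper: compare sequence IDs in a FASTA file with the
# seqid column (column 1) of a GFF3 file.
check_seqids() {
    fasta="$1"
    gff="$2"
    tmpdir=$(mktemp -d)

    # FASTA IDs: the first word after ">" on each header line.
    grep '^>' "$fasta" | sed 's/^>//' | awk '{print $1}' | sort -u > "$tmpdir/fasta.ids"

    # GFF3 IDs: column 1 of every non-comment line that has all nine columns.
    awk -F '\t' '!/^#/ && NF >= 9 {print $1}' "$gff" | sort -u > "$tmpdir/gff.ids"

    # comm -3 keeps only the lines that are unique to one of the two sorted lists.
    mismatches=$(comm -3 "$tmpdir/fasta.ids" "$tmpdir/gff.ids")
    rm -rf "$tmpdir"

    if [ -n "$mismatches" ]; then
        echo "Sequence IDs differ between $fasta and $gff:"
        echo "$mismatches"
        return 1
    fi
    echo "Sequence IDs match"
}
```

After defining the function in the current shell, it can be run on the two decompressed
files, for example: check_seqids GCF_009858895.2_ASM985889v3_genomic.fna
GCF_009858895.2_ASM985889v3_genomic.gff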
2. Use the “gff3ToGenePred” tool to convert the GFF file to a GenePred file, a file
format used to specify the gene track annotations for an imported genome. For a file in
GTF format, use “gtfToGenePred” instead. Both “gff3ToGenePred” and “gtfToGenePred” are
among the UCSC Genome Browser application binaries built for standalone command-line use
on Linux and UNIX platforms. They can be downloaded by choosing the directory for your
platform at “http://hgdownload.soe.ucsc.edu/admin/exe/”. For the sake of simplicity, we
can download “gff3ToGenePred” into the same “sarscov2db” directory and use “chmod” to
make it executable:
wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/gff3ToGenePred
chmod +x gff3ToGenePred
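The command above assumes a 64-bit Linux system. As a rough sketch of how the platform
directory could be chosen automatically, the hypothetical function below maps the output of
“uname” to a directory name. Only “linux.x86_64” appears in this text; “macOSX.x86_64” is
an assumption that should be checked against the directory listing on the download page.

```shell
# Hypothetical helper: map the operating system and CPU architecture to the
# name of a UCSC binary directory. Arguments default to the values from uname.
ucsc_platform_dir() {
    os=${1:-$(uname -s)}
    arch=${2:-$(uname -m)}
    case "$os/$arch" in
        Linux/x86_64)  echo "linux.x86_64" ;;
        # Assumed directory name; verify it on the download page before use.
        Darwin/x86_64) echo "macOSX.x86_64" ;;
        *)
            echo "no known UCSC binary directory for $os/$arch" >&2
            return 1
            ;;
    esac
}
```

The printed name can then be substituted for “linux.x86_64” in the “wget” URL shown above.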
If you wish to download all UCSC Genome Browser binaries, run the following: